11 research outputs found

    An Exploratory Study on Code Attention in BERT

    Full text link
    Many recent studies in software engineering have introduced deep neural models based on the Transformer architecture or use Transformer-based Pre-trained Language Models (PLMs) trained on code. Although these models achieve state-of-the-art results in many downstream tasks, such as code summarization and bug detection, they are based on the Transformer and PLMs, which are mainly studied in the Natural Language Processing (NLP) field. Current studies rely on reasoning and practices from NLP when applying these models to code, despite the differences between natural languages and programming languages, and there is limited literature explaining how code is modeled. Here, we investigate the attention behavior of a PLM on code and compare it with natural language. We pre-trained BERT, a Transformer-based PLM, on code and explored what kind of information it learns, both semantic and syntactic. We ran several experiments to analyze the attention values of code constructs on each other and what BERT learns in each layer. Our analyses show that BERT pays more attention to syntactic entities, specifically identifiers and separators, in contrast to the most-attended token, [CLS], in NLP. This observation motivated us to leverage identifiers to represent the code sequence instead of the [CLS] token for code clone detection. Our results show that employing embeddings from identifiers increases the F1-score of BERT by 605% in its lower layers and 4% in its upper layers. When identifiers' embeddings are used in CodeBERT, a code-based PLM, the F1-score of clone detection improves by 21-24%. These findings can benefit the research community by motivating code-specific representations instead of the common embeddings used in NLP, and they open new directions for developing smaller models with similar performance. Comment: Accepted in ICPC 202
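    As a minimal sketch of the identifier-based representation described above (not the paper's implementation), the snippet below pools BERT's hidden states over identifier positions instead of reading the [CLS] vector. The checkpoint name and the identifier_mask() helper are illustrative assumptions; a real version would use a code-pretrained checkpoint and a proper lexer for the target language.

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Placeholder checkpoint; the paper pre-trains its own BERT on code.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

    def identifier_mask(tokens: list[str]) -> torch.Tensor:
        """Hypothetical helper: True where a token is an identifier.
        A real version would come from a lexer, not a keyword blacklist."""
        non_identifiers = {"def", "return", "if", "else", "(", ")", ":", ",",
                           "[CLS]", "[SEP]"}
        return torch.tensor([t not in non_identifiers for t in tokens])

    def embed(code: str, layer: int = -1) -> tuple[torch.Tensor, torch.Tensor]:
        """Return ([CLS] embedding, mean identifier embedding) at a given layer."""
        enc = tokenizer(code, return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = model(**enc).hidden_states[layer][0]   # (seq_len, dim)
        tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
        mask = identifier_mask(tokens)
        cls_vec = hidden[0]                                 # [CLS] sits at position 0
        ident_vec = hidden[mask].mean(dim=0)                # pool identifier positions
        return cls_vec, ident_vec

    # Clone detection then reduces to comparing snippet vectors, e.g. by cosine similarity:
    _, a_id = embed("def add(x, y): return x + y")
    _, b_id = embed("def plus(a, b): return a + b")
    print(torch.cosine_similarity(a_id, b_id, dim=0))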

    Learning Low-Rank Latent Spaces with Simple Deterministic Autoencoder: Theoretical and Empirical Insights

    Full text link
    The autoencoder is an unsupervised learning paradigm that aims to create a compact latent representation of data by minimizing the reconstruction loss. However, it tends to overlook the fact that most data (e.g., images) lie in a lower-dimensional subspace, which is crucial for effective data representation. To address this limitation, we propose a novel approach called the Low-Rank Autoencoder (LoRAE). In LoRAE, we incorporate a low-rank regularizer that adaptively learns a low-dimensional latent space while preserving the basic objective of the autoencoder; the result is a simple autoencoder extension that embeds the data in a lower-dimensional space while retaining the important information. Theoretically, we establish a tighter error bound for our model. Empirically, our model shows superior performance on various tasks such as image generation and downstream classification. Both the theoretical and practical results highlight the importance of acquiring low-dimensional embeddings. Comment: Accepted @ IEEE/CVF WACV 202
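    A minimal sketch of the low-rank idea, under assumed architecture sizes and penalty weight (the paper's exact regularizer and network are not reproduced here): a plain deterministic autoencoder whose loss adds a nuclear-norm penalty on the batch of latent codes, a standard convex surrogate for rank.

    import torch
    import torch.nn as nn

    class LowRankAE(nn.Module):
        def __init__(self, in_dim: int = 784, latent_dim: int = 64):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                         nn.Linear(256, latent_dim))
            self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                         nn.Linear(256, in_dim))

        def forward(self, x):
            z = self.encoder(x)
            return self.decoder(z), z

    def loss_fn(x, x_hat, z, lam: float = 1e-3):
        recon = nn.functional.mse_loss(x_hat, x)           # basic AE objective
        nuclear = torch.linalg.matrix_norm(z, ord="nuc")   # sum of singular values of
                                                           # the (batch, latent) matrix
        return recon + lam * nuclear                       # lam is an assumed weight

    model = LowRankAE()
    x = torch.rand(128, 784)                               # e.g. a flattened image batch
    x_hat, z = model(x)
    loss = loss_fn(x, x_hat, z)
    loss.backward()

    Penalizing the nuclear norm of the latent batch pushes most singular values toward zero, so the codes concentrate in a low-dimensional subspace without changing the reconstruction objective.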

    Source code representation for comment generation and program comprehension

    No full text
    Code comment generation is the task of generating a high-level natural language description for a given code snippet. Comments help software developers maintain programs; however, comments are often missing or outdated. Many studies develop models to generate comments automatically, mainly using deep neural networks. A missing point in the current research is capturing each character's information and the syntactic differences between tokens; moreover, the contextual meaning of code tokens is generally overlooked. In this thesis, we present the LAnguage Model and Named Entity Recognition Code comment generator (LAMNER-Code). A character-level language model is used to learn the semantic representation, and a Named Entity Recognition model is trained to learn the code entities. These representations are used in a Neural Machine Translation architecture to produce comments. We evaluate the comments generated by our model and other baselines against the ground truth on a Java dataset with four standard metrics, BLEU, ROUGE-L, METEOR, and CIDEr, which are improved by 3.26, 5.27, 1.25, and 0.1 points, respectively. The existing techniques and our proposed work are complementary to each other. Experiments on abstracted code further demonstrate the value of the LAMNER-Code embeddings, and a human evaluation confirms the quality of LAMNER-Code comments compared to the baselines and the reference comments. The new decoder sampling strategy presented in this work can also better recall identifiers during comment generation. Despite this improvement, the performance of Transformer-based models is comparable to this work; we therefore conduct an additional exploratory study to understand source code comprehension by Transformer-based models. The findings from this study reveal some similarities with natural languages and also present differences in the attention of the Transformer-based language model for source code. Finally, we use these findings to develop a new identifier-based embedding for the classification task, which further improves the performance of code clone detection over the vanilla techniques that use the 'CLS' token.
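    A minimal sketch of the token representation described above, with every module size, tag set, and architectural choice an assumption rather than the thesis implementation: each code token is embedded as the concatenation of a character-level encoding (a BiLSTM standing in for the character-level language model) and a learned embedding of its entity type from the NER model; the resulting vectors are what the Neural Machine Translation encoder would consume.

    import torch
    import torch.nn as nn

    # Assumed entity tag set; the thesis trains an NER model to assign such tags.
    ENTITY_TYPES = ["identifier", "keyword", "operator", "literal", "separator"]

    class TokenRepresentation(nn.Module):
        def __init__(self, n_chars: int = 128, char_dim: int = 32,
                     sem_dim: int = 64, ent_dim: int = 16):
            super().__init__()
            self.char_emb = nn.Embedding(n_chars, char_dim)
            # Character-level BiLSTM as a stand-in for the character-level LM.
            self.char_lstm = nn.LSTM(char_dim, sem_dim // 2, bidirectional=True,
                                     batch_first=True)
            self.ent_emb = nn.Embedding(len(ENTITY_TYPES), ent_dim)

        def forward(self, token: str, entity: str) -> torch.Tensor:
            chars = torch.tensor([[min(ord(c), 127) for c in token]])
            _, (h, _) = self.char_lstm(self.char_emb(chars))
            semantic = torch.cat([h[0, 0], h[1, 0]])    # forward/backward final states
            syntactic = self.ent_emb(torch.tensor(ENTITY_TYPES.index(entity)))
            return torch.cat([semantic, syntactic])     # (sem_dim + ent_dim,)

    rep = TokenRepresentation()
    vec = rep("count", "identifier")   # one enriched token vector for the NMT encoder
    print(vec.shape)                   # torch.Size([80])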

    In Vitro and In Silico Studies for the Identification of Potent Metabolites of Some High-Altitude Medicinal Plants from Nepal Inhibiting SARS-CoV-2 Spike Protein

    No full text
    Despite ongoing vaccination programs against COVID-19 around the world, cases of infection are still rising with new variants. This implies that an effective antiviral drug against COVID-19 is crucial, alongside vaccination, to decrease cases. A potential target of such antivirals could be the membrane components of the causative pathogen, SARS-CoV-2, for instance, the spike (S) protein. In our research, we deployed in vitro screening of crude extracts of seven ethnomedicinal plants against the spike receptor-binding domain (S1-RBD) of SARS-CoV-2 using an enzyme-linked immunosorbent assay (ELISA). Following encouraging in vitro results for Tinospora cordifolia, in silico studies were conducted for the 14 reported antiviral secondary metabolites isolated from T. cordifolia, a species widely cultivated and used as an antiviral drug in the Himalayan country of Nepal, using Genetic Optimization for Ligand Docking (GOLD), Molecular Operating Environment (MOE), and BIOVIA Discovery Studio. The molecular docking and binding energy study revealed that cordifolioside-A had the highest binding affinity and was the most effective in binding to the competitive site of the spike protein. Molecular dynamics (MD) simulation studies using GROMACS 5.4.1 further assessed the interaction between the potent compound and the binding sites of the spike protein, revealing that cordifolioside-A demonstrated better binding affinity and stability and induced a conformational change in S1-RBD, hence hindering the activities of the protein. In addition, ADMET analysis of the secondary metabolites from T. cordifolia revealed promising pharmacokinetic properties. Our study thus suggests that certain secondary metabolites of T. cordifolia are possible medicinal candidates against SARS-CoV-2.

    A narrative review on yoga: a potential intervention for augmenting immunomodulation and mental health in COVID-19

    No full text
    Background: The ongoing novel coronavirus disease 2019 (COVID-19) pandemic has a significant mortality rate of 3–5%. The principal causes of multiorgan failure and death are cytokine release syndrome and immune dysfunction. Stress, anxiety, and depression have been aggravated by the pandemic and its resultant restrictions on day-to-day life, which may contribute to immune dysregulation. Thus, strengthening immunity and preventing cytokine release syndrome are important for preventing and minimizing mortality in COVID-19 patients. However, although a few specific remedies now exist for the SARS-CoV-2 virus, the principal modes of prevention include vaccination, masking, and holistic healing methods, such as yoga. Currently, extensive research is being conducted to better understand the neuroendocrinoimmunological mechanisms by which yoga alleviates stress and inflammation. This review article explores the anti-inflammatory and immune-modulating potentials of yoga, along with its role in reducing the risk of immune dysfunction and impaired mental health.
    Methods: We conducted this narrative review of published literature in the MEDLINE, EMBASE, and COCHRANE databases. Titles and abstracts were screened by two independent review authors; potentially eligible citations were retrieved for full-text review. References of included articles, and articles of major non-indexed peer-reviewed journals, were searched for relevance by two independent review authors. A third review author checked the excluded records. All disagreements were resolved through discussion amongst the review authors or through adjudication by a fourth review author. Abstracts, editorials, conference proceedings, and clinical trial registrations were excluded.
    Observations: Yoga is a nonpharmacological, cost-effective, and safe intervention associated with several health benefits. Originating in ancient India, this vast discipline consists of postures (asanas), breathing techniques (pranayama), meditation (dhyana/dharana), and relaxation. Studies have demonstrated yoga’s ability to bolster innate immunity and to inhibit cytokine release syndrome. As an intervention, yoga has been shown to improve mental health, as it alleviates anxiety, depression, and stress and enhances mindfulness, self-control, and self-regulation. Yoga has also been correlated with numerous cardioprotective effects, which may play a role in COVID-19 by preventing lung and cardiac injury.
    Conclusion and relevance: This review paves the path for further research on yoga as a potential intervention for enhancing innate immunity and mental health, and thus its role in prevention and adjunctive treatment in COVID-19.

    Abstracts of AICTE Sponsored International Conference on Post-COVID Symptoms and Complications in Health

    No full text
    This book presents the selected abstracts of the International Conference on Post-COVID Symptoms and Complications in Health, hosted from the 28th to the 29th of April 2022 in virtual mode by the LR Institute of Pharmacy, Solan (H.P.)-173223, in collaboration with AICTE, New Delhi. The conference focuses on the implications of long-term symptoms for public health, ways to mitigate these complications, improving understanding of the disease process in COVID-19 patients, the use of computational methods and artificial intelligence in predicting complications, and the role of various drug delivery systems in combating the complications.
    Conference Title: International Conference on Post-COVID Symptoms and Complications in Health
    Conference Sponsor: AICTE, New Delhi
    Conference Date: 28-29 April 2022
    Conference Location: Online
    Conference Organizer: LR Institute of Pharmacy, Solan (H.P.)-173223